(54) DOCUMENT DATA PROCESSOR
(57)Abstract: 

PROBLEM TO BE SOLVED: To provide a document data processor with improved
processing efficiency compared with a conventional document data processor,
whose efficiency is low because the HTML parser processing is performed
only after the XML parser processing has been performed.
SOLUTION: In this document data processor, a CPU 11 reads document data and
processes it as an XML parser. When a tag that is not an XML tag is detected
during the XML parser processing, the start tag processing part or end tag
processing part of the HTML parser is activated and the pertinent part is
processed. Further, when a tag related to CDATA or pre-formatted text is
found, the corresponding processing is performed.
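
The single-pass dispatch described in the SOLUTION can be sketched roughly as below. This is only an illustration under stated assumptions, not the implementation disclosed in the patent: the names `parse`, `XML_TAGS` and the event tuples are hypothetical, and deciding that a tag is "not an XML tag" by checking a fixed set of element names is an assumption, since the abstract does not say how that test is made.

```python
import re

# Hypothetical set of element names treated as XML tags (for example,
# names declared in a DTD). This set is an assumption for the sketch.
XML_TAGS = {"doc", "item"}

# One token is a CDATA section, a start/end tag, or a run of character data.
TOKEN = re.compile(
    r"<!\[CDATA\[(.*?)\]\]>"           # group 1: CDATA content
    r"|<(/?)([A-Za-z][\w.-]*)[^>]*>"   # group 2: "/" for end tags, group 3: name
    r"|([^<]+)",                       # group 4: character data
    re.S,
)

def parse(text, xml_tags=XML_TAGS):
    """Scan the document once, handing non-XML tags to HTML handling."""
    events = []
    in_pre = False  # True while inside a pre-formatted (<pre>) region
    for m in TOKEN.finditer(text):
        cdata, slash, name, chars = m.groups()
        if cdata is not None:
            # CDATA content is passed through without further parsing.
            events.append(("cdata", cdata))
        elif name is not None:
            if name in xml_tags:
                events.append(("xml_end" if slash else "xml_start", name))
            else:
                # Not an XML tag: stands in for activating the HTML
                # parser's start tag or end tag processing part.
                lname = name.lower()
                if lname == "pre":
                    in_pre = not slash
                events.append(("html_end" if slash else "html_start", lname))
        else:
            # Whitespace is kept inside <pre> and collapsed elsewhere.
            data = chars if in_pre else " ".join(chars.split())
            if data:
                events.append(("text", data))
    return events
```

Calling `parse` on a mixed document yields one ordered event stream in a single pass, rather than running a complete XML parse followed by a separate HTML parse.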
[18*^7 ] ig*3s ! ^•■66<?>c*rm^tciaaoc'.'$^

r-- la^ i - }i ccik ^ x^m. 

)\^-A^i>c^^^x^mti>m\'^t(Di^rtv^^^mi^icn 
^tf^20}\^-A^u:^'7Xi^^nfc&<^y:mr-'^o> 

30 f>'T^E<Z:'^^X^ f^-- tS? ^HUlaM 1 <D}V-}l(C^r>X^m 

<: t f ^ M 2 A - 'tr' ^ y ^. - . U i , 

[0001]

^I'CXML (Extensible Markup Language) t. HTML (HyperText Markup Language) iOC'JSJCT^tC^Jtnlgirjr

[0002]

11i:$^*>^o HTMLli. SGML (Standard Generalized Markup Language) ^^^6§3!?:^itcm
[0003] C OC'X M Lie cT: 0 lEii^ nA:*#^-
\t. HTMLi|Sit|tcBRfi3nr«!!:S5n4<DT'4S^> 

[0004] tK.hX\ XML^SGMLfli. 
.'L/e^DTD (Document Type Definition) ^|&:^t/^Ci'>^

[0005] SGMLfli. -7-- ':?m<fjtc^ 20 

\t. Vkl^wct^^^W^\^xl^^t::^hx*>^\^l'b 

> 7^ > *> f -5b '|5-3>>^$(l!SiJ 1/ , -C- ^) Jif t ^ :3 > > 
iS-r ^(g^!^:?^^* ^m\^<. HTML o::'<P> ( ^ll<7>r?S 

[0 00 6] Ct^cclfc-^. XMLii. rr5fi^^?i^<lJ^7^" 
m. "T-'^t^^t (J>fAW<: S G M L J: 0 PI^Bg«:^«> 

i PCDATA (Parsed Character Data) ^?
m%iyXm,f-'SO>:fi^^^&mthY{Ct^klk (Replaceable Character Data) ^? 5''4> ^K-?- 4> Wjr

i l/C S ^ C i 'C * -£> CDATA (Character Data) t<J>ZnWi'*hr,fci>'^. XMLT'li. PCDATA
CC'ii^jO^Wwjrer^)^, CCf. CDATAt^, ^
,^^eniSiJ<^SB^'Cv< . JavaScript ^<:«^:^x 

[ 0 0 0 7 ] cc-fci!?). ^^^oy-x^f'-'^^m^mx' 
\t. xyiLtY{iuL(m^^^n^'.^^i>tcmc. xm 

L "f- ^? ^^Ajp 0 r H T M L r 1/ , vt iiriSa 
^fe© H T M L ^ - :^ ^ T ^> C i 4; U r C ' ^ . 
[0008]
I*. DTD€^:&5ra3i'^t6ni:c^;:ic^^o[>^^:^<. x

t^oC'HTMLSlt'Cii. ^^T^? y3!>^^$nr<,»-^^<^> 
A^:^<. XML'"<--t?'5)s«oc'S«iES*c5Aiair^rJS:c*i 

[0 00 9] ^^tc, j^^cC'HTML:^#r-a?i<:(i. 
JavaScript (^:^c^^^<^<5CR!PT:-^? i?'^. 

:3>r>7i^^CDATAi0r^Ii$n&C<LA^fi 

XML'■^-1?*^:'li«<J^>l^J!iES^<:5Aa|'$'^ci7^^ 
[ 0 0 1 0 ] *^^i^±l^^^#cc.^K^•c^$n:^■c^><7> 

[0011]

i X M L CC J: D ^ n^-cgli^x#?^ - <7>:^rj: < i ^ 

iAgggr-^ o'C, ^;t^o:'SU<>x#?^~- iS? ^ h TMLS: 
^XMLASi \^x^mr^'Siint(OK>'^nip^'mm 

[00 12] Ccc:'XML-'<-if^^ii:^:0, XML^'^- 
'^mW^^'^^^^^ t flic HTML S|l<^C>xS ^> 

ti>^^t^-^i^. pi2XML^><-if^^7!>^, a«ias^?f 

^WthX\.^^. cntcJ:D. DTD;ta>!>^B9ii-:Jtt5 
n*C<.»^ir f-- ^> tc*| U r ^ r :? * .'H- 0>D T D «: 
[0013] 5*>tc, xuu<--^^mt. ^m^»it
(0 0 1 4 ] $/c. ±.^^m(mn^.^mm^fc.^ 

1 0 0 16] ^tc. C CX-mi^^l^^ ^CCli, :i'^^< i 

[ 0 0 17]^ ±-u^m^0i0mm^^.^m^krhtc 
ia^^o>X^'^-'^^mmmf'Ci^l^x. $^tc. s^-i?^ 30 

i. ^ i/i)^^^$nxi^^'.^^'^-i)^mffiO. Mmmo 
^m^cm-r^i, ^x ^ i/i>^<m 5 nr t ^ ^ i * tcii. mm 
mtifc^ i:fm%ox^Mr--'s^^m^i>ct^^ 

ic^oxh±n^mm^\^^c tiy^x^i>o 
1 0 0 18] ±tdu^mm^f^.mik^i>tcit^€'tm 
m^ta^omm. xn^-'s^mmmxih^x. mi 

U^xB^^t\fc^^x^^-'>0'prj,< th-:^^^ 
^;^g.7:_^^ :^^m'^6X^T-^9!immMx$}<7 
X. ^m<J>mtMr-^mBW 1 OJi'-.«u<:f.$-7r 

iltm^^rc;(.'/rHtii2^i^><-if^l&^.^s^^^W^<i:. i 

[0019]
i^mom^mM] '^^mo>^m>j[micr>K^xm^
^r-'ji 9m^mit. <- v ,-1. 2 > t ^ - x ^ ^) , 

0Ml2i. RAM13<L. Kf^" < ! r 

[ 0 0 2 0 ] C PU 1 n*. wm^k&k(^':^- h RO 
Ml S(Cte$^5tn:C^^^n^7.^.?:RAMl 3±(CP 

J:0. CPU 1 li*. ^>^- h"r 1 4CcfeS^^n;<i 
:t'^b-r^>i5^i/;Nr/K$:RAMl 3±CCD- KU. 

s^Lfg^rraj^-^^, •c-or. cpul li*, mmf^ieip 

'5'1f?:M-F7^"i;5.>? i4;0.'^RAMl 3±tCn-Kl/ 

[002 1 ] :r~hROMl 2^. CPU 1 l(0^m\t 
m^l'Cfmr6':ra i/v A^^mbXi.^i>o RAMI 3 

ii. cpul \<0'y'-i;M^^)tbxm%r^, />-k 

ri:?;i?14i*. CPUl lifi^mrr^^my^^y^ 
^rfeS^l/Tt^^, ^feCOC'^'^-- Kr < 1 4C*. CP 

ui iwmiC'm^j:^-^ {^;ic^*ir!ccget^'^nfe 

[0 02 2] I5ii. CPUl 

l^f^OC'rt^^rCPU 1 \ LAN-f>a?7* 
1 7li, LAN (Local Area Network) S'Ji'^^^
F^rsg^UTWebtf-^M'tca^sntr^aO. C 
PU 1 i^>^^=>A:^$n^lg^ccJ:♦^-^'.^ h^'-iy^/rb 
Xf^-'^^mt\y. ^fc. T^^v V':>-^^itOXl^\^^ 
^^-^^^r^fiurcpu 1 itcffiAj-r^o 

[ 0 0 2 3 ] JiailEtft^g 1 8 ii, :?P^ptf- (gSl^e 

iaisS($:^7!»6 7^- ^^^a (./-cc P u 1 1 e^ili:^^ 

CPUl H*, C<^51SPIEtft^^I 83t>^6ia-i5J^aiU 
fcf=-'^^>\~ K-f^^ Ai? I 4tciAJS:^a^^T'AiC/r 

[ 0 0 2 4] c cf'C PU 1 1 o>xmr-^'9m^ic-?i^ 
xmm^i>. *'mio>mBv:i^i>cp\j\ i^^^fg-?-^ 

XSf^- ^?5AfgC'fei6cr>r'7 O^yu ify Mt. 112 tC 
mt^,'^i>C^ TCP/ j P'/D h 2^l.fil?i5ff^2 1 X 
ML''^--X^2 2^, HTML'*^-XSll2 3i. 
1?'37^2 4 4;. *Sll^2 5i3!>>?E>m6£$4n:c^^o ^ 
A:. HTML^'<-X^2 3«, 7*^7 ^t--^-.' h 93^1^ 
3l<i:. CDATA^Aiig|i3 2i, Tf^^^^^ ^5AfBg(i3 3 
i. J^7^i?y$3}i^3 4<i;>0-'^E>1fi.^^nri^-2>, CC 
[0025] TCP/! P::^D h3il.^il^2 T
CP/ j P'7"0 h:2MC<^:^X^^y h7-!?^mi^0X 10 

ccx\ Tcp/i pyuh^)imm2 imr\^^^ 
ccD03tcfe<,i^c. r<j t. r>j itr-e^nrc* 

[0 02 6] ^/c. HS^^k^Jinri^'til^A^ XMLtC 
[0027] CCtT', CPU ! l:i?>^TCP/! P:/P I 

:2}i^mm I 'CiiJiWcS:S'r-3?i<:^i/rtT^>XM 

L^' - A'^2 2 i OX0>9!^m^m 4 b X'^mt 

T^^'7^,'ihDTDiiO^*ficcgKE2n, ise^^nr 
i.^^iyO>tt^, CCX\ ^iV<y^-'>tit. usee 
mtc^;^tc, m9^'> i^mBm3 3^<0^^^>'^ (A) 
i. ^^7^?y^fi^3 4'^.0C•*'^>^^ <B) i. CDA 
TA^ii^3 2-N(Di<'^>a7 (C) <L. CDATAiC/ 40 
r53}i'^^*5?^?'«(^>£^J (D) i. r^U^^-^p-^h 

^S|i3 1 t:'^Ajg-r-< ^ ifZO>m^\ < F ) i , 

fJ^MEv^, lie <a) cc^T<^:^tc:, IHS']^ 
(H> < ! ) i. < J) fySfa^?5^ 

iSk-^f^ifch(DX'^i>o ^tc. ptimmv*. lie (b) 50 




4^32 0 0 1-32 5 24 8 

(0) m]±iMo>'^ iP) t^mi^r^^ifci>(DV 

It. si^jt<ft]6n^J:^«r. NULL'CS?Jjor:.^7'5: 
mt^CttbXK^^. 

[0028] CPUl Oct. XML''<-xm2 2(D^m 

TD^''\"^rf^i;^i' 14 iO-^^D-Ft*^ (S2)o -e 
UC. X#r-^?^!^i^<^ <S3) . 3^^r--^:5-^$^7 
Ufc:^:^i>-^i6:m^ {S4) . $?7l/CCini« (Yesrjr 

[0029] ^fc. ^mSAt^Cidi^X. ^^7UXi.^tj:ii 

6) . ^ms3icm^xsi^m^mii^ (a) , x 

*)^;0>S-:;:>^^|j5l-^ { S 7 ) . mt^'J^ TChtlit (Yes 

. ^mm^'J^ ^^(O^triS^^-tox'fy 
hDTD^^Bso, ll^^a5?^^5^•M^3 3--(^>;^<'^'>-^?^ 

mm 3 ^s^S&T^ {$8)0 CO^fS'i? 5^4Agg|i3 3 

[ 0 0 3 0 ] or . ^ i/^^g^^s 3 <;>4/Ljg;0iS.7 
r-Si. CPUl XML'»^--trcC'5>!:^«:Pirf§b. 
«A}iS 8 XmBbfcm^^ DATA i i./C^"ST 

l/(S9>> CDATA4: bX^mrT^B trC^tliX 
{Ye^Xhtiitt) . C D AT Am^mZ 2 ^t^mti> 
(S 1 0) o CPU 1 Ik*. CCF>CDATA5A}pg}i3 2 

m. CDATAtuxm^b. 

CDATAsai||^3 2<X:'«ri1f3!>^^TT^ 
t. CPUl «4MS30CM'?'CXML-M-— tfWA 

[0 03 1 )-:^^, ims9i>ci}K^x. cdata^o 
^um^'^ m-rfx^r? sAjir^gf 4? *fx^h 

'7•^:?>^-^■;^ i^f ^ni* KY ^ %uh 

. :71.'7*-vm h^}p^3 i^&lssif (SI 
2) ♦ 't-t/fCPU 1 li*. 7U7,r-"7-. K$?iis^3 
1 (PMmt l/C. SS^^i? ^^(^mf •5^^^7 
sn^fel, PCDATAiUt:;l;t&i^e<:iEiJDT-ssae'$: 




t t 
'y. <hX. CPU 1 !v*. yi-yt-^v r^fi^3
[0 03 2] 3 CPUl 1 U. ^l^S 7<<:4<$C^ 

[0033] ^tc^mSl^iCidl^X. it^J ^'Ctiii 

[0 03 4] ccx\ CPUl i^nf^m^-i^^ai^ 

3 3©ftfTCC':><,»rSft^T'So CPUl Hi. r?Sm^- ^ 

5A}iS|J3 3^ris*jASn:^c:J^#r-^?^5ll^^UT^^I^ 20 

[ 0 0 3 5] mef^^m4ic^.ufcltMr-'sv:^o^ 
CPUl Hi. mi^^irtw^mf&^^. trj.t> 

xrn^u. mt^^ f/^mmz 3<j>m^tox c(o r™ 30 

{ X ) . ^xcc 2 ttg(?>*.t€ADi>^ $ fsic ^? i^t L rag 

T^ic Theaoj cC'^^^iiMur (Y) > c:^ theadj 
^^^-^^^-^h, ^ir>^cz\mo> Ftitlej 4>0a«&^i^ 

f^T«U. 5l*^< fHCfcE P/MZJ ?:PCDATAil/C 

i^iiiith ( 2 > 0 

[0 03 6] l^/c. CC'C. CPUl imfY^l'^tf 
5Afgg|l3 4CD^mcoC^*C|lzi^f CPUl H*. 40 

xn-^-ft-A^^\^\yXK^im^t^^Witisxms^h. X 

M \^(rjim'M>m^^^ H T ML i SPtfrf -^^fS^^i 
^r. ^S^ieiS=&HTML*#CCIs:ftUt:RAMl3CC 

mB^tcig 4 S^iCJ^ii 7 tc4aoi:ii. f title j f|>^tcf^ 

^JD^tlT^^^PCDATA fWftE RMZJ <Z}<0?3^C' 
LE>J ^ r</TITL&.J $ W^*m3i r-TITL 50 
HTMLxS<J:UrRAMl 3CCfeS«J'r^o -eO'C.

Ttitlej i:*: i u^ji'±<i<?> f headj (Y) -^t^-Y^ 
^^ur. *?7 5?^55:isis&3 4<F>^ag<C'i&mtu*c^^ 

1 0 0 3 7 ] 5 ^tcC PU 1 1 li. XMLM--XS|i2 2 

(?>^s?!^Tr-SKt, "7'^*?if3rai24^«asi)iyi:, 

RAMI 3tcKS^$nA:HTML** <HTML^5;0'X 
M L c::'P.?S l//c^(t?!>^ ^ X M L > XSI12 2 JSCi^ HTM 
L > -iT'^ 2 3 cc'ftfcfl^oc c^; J? H T M L ic^s:?^ $ tifcy: 
S) ms^.O. <:(DHTMLx#tcg':Jt^rlSiig|i2 5 

t{yX<J>URL (Uniform Resource Locators)
^U'C, r><:^.-i'{^'i\^t^^t^mthX(^-Xmr-- 

[0038] f^. c:c:5T'CC'jji^tc4ac^r(i:. f^y^}\^ 

HTML3. 2) Xh-7tct^^C<D^=ry ^ ilVDTZ^ 

^u-V'^h^^^i.chxh^'x^^ ^fc. m^^'s^9^m^ 

3Z<DmrtbX. r<*mL>J ^y?:^.ffiL/feiS^OC, 

^-nsr-tcDTDsi- < r<!DcaYPEj ) ^ij^fflorc^ 
[0 03 9] 5 a^^i^^nfc:3^^-7-^-i^Kjii/ 

n^DTDA^^a^i'^cci*. CPUl Hi, COC'DTD 
^^l' h D T Di<!;4>tCXML''^-Xg^2 2<?>4Afgi<:4Jl* 

[0 04 0] ^fc. mV-^^^-^'sVCiiSi^X. '^J'S^ 

m^tW^.'^^fxicm'^^Ckt. CPUl Hi, ^i^prSfi^^^ 
*1^3£(cii*0$ny<:^3£<7)l-?±{i<D 

■tSSWet:'*)^ i * tcji. ^? ^ ^-^^^SE MP T Y^'*^ 

< r/>j x%i^liri>'^ tfX'^h) i>\ ^jtEMPTYf 

^i^^|g^.Mt 4> 2 > r > tc^ * 'S'^fe^iC-^^ 

[0 04 1 ] ^/c, ^^l8§«Jg^i:'^^C^i5?:5^T:'$>'5><L^?^. 
EMPTYr''\i<, J^^'^gSM^*?^ :2 > '7 

[0042]



[0043]$ fc'^§miz ^M\t. 

mz] cpui \ im\^t^^v'^'^(K>y^ vox 
* [114] CPUl !<DXML^»?-tril/ra)$3li^^

[0& ] m%^'^y^-^<o-m^^timmxhi,. 

[@6 1 Z^l b(K>-m^mtlAmm<:i>h^ 

11 CPU. 12 T'-hROM. 13 RAM. 1 

10 ^. 17 LAN-r^^p^r^x-x, IS H^taiS^ 
g. 2 1 TCP/! P:/a['2ilfi¥^fi-^, 22 XM 
LvN-X^, 2aHTML^'>-xaB. 2 4 'T^'^^^^ 
rg&. 2 5 SiS^. 3 1 7^7*-^-:^ r^E^, 
32 CDATASAM^. 33 fel^^ 3? i?'4Aai*ii. 34 
<CENTER>

<H1> welcome to xxxxxxxx </H1>